{
  "nbformat": 4,
  "nbformat_minor": 0,
  "metadata": {
    "colab": {
      "provenance": [],
      "toc_visible": true
    },
    "kernelspec": {
      "name": "python3",
      "display_name": "Python 3"
    },
    "language_info": {
      "name": "python"
    }
  },
  "cells": [
    {
      "cell_type": "markdown",
      "source": [
        "# **Reproduction of climate ambition scores**\n",
        "\n",
        "- The climate ambition scores cannot be reproduced exactly, because they are based on LLM classification via the OpenAI API (gpt-4o model). However, the code provided here can be rerun and should yield very similar results.\n",
        "- Rerunning the full code requires an OpenAI API key for the classification step; note that this incurs API costs!\n",
        "\n",
        "- Alternatively, the notebook provides checkpoints at which the already classified data can be loaded and inspected for the interest groups, the media articles and the partisan statements, respectively.\n",
        "\n"
      ],
      "metadata": {
        "id": "2AO6ZC3fKjCi"
      }
    },
    {
      "cell_type": "markdown",
      "source": [
        "**Data requirements:**\n",
        "\n",
        "  - Interest group data with topic label: Annotationen_interest_groups.xlsx\n",
        "  - Interest group data with topic and climate labels: interest_df_topic_climate_labels.feather\n",
        "\n",
        "  - Media data with topic label: media_df_topic_lables.feather\n",
        "  - Media data with topic and climate labels: media_df_topic_climate_labels.feather\n",
        "\n",
        "  - Party data with topic label: parties_df_topic_labels.feather\n",
        "  - Party data with topic and climate labels: parties_df_topic_climate_labels.feather\n"
      ],
      "metadata": {
        "id": "7VW-wcXlPp8w"
      }
    },
    {
      "cell_type": "markdown",
      "source": [
        "Mounting the drive"
      ],
      "metadata": {
        "id": "U8LR-yXFUlmc"
      }
    },
    {
      "cell_type": "code",
      "source": [
        "from google.colab import drive\n",
        "drive.mount('/content/drive')"
      ],
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "s_WTxQhcUlBz",
        "outputId": "dc0873b9-890b-4ddc-e8d2-a4b4ed2138f4"
      },
      "execution_count": 1,
      "outputs": [
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "Mounted at /content/drive\n"
          ]
        }
      ]
    },
    {
      "cell_type": "markdown",
      "source": [
        "# 1. Interest groups"
      ],
      "metadata": {
        "id": "Gg2Z76OsKmrA"
      }
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "OB_DQj7wO7Rt"
      },
      "outputs": [],
      "source": [
        "import pandas as pd\n",
        "\n",
        "# Load the interest group file (full texts, not sentence level)\n",
        "interest_df = pd.read_excel('/content/drive/Annotationen_interest_groups.xlsx')"
      ]
    },
    {
      "cell_type": "code",
      "source": [
        "# Number of documents\n",
        "num_docs = len(interest_df)\n",
        "\n",
        "# Convert date column to datetime if not already\n",
        "interest_df[\"Date\"] = pd.to_datetime(interest_df[\"Date\"], errors=\"coerce\")\n",
        "\n",
        "# Earliest and latest date\n",
        "earliest_date = interest_df[\"Date\"].min()\n",
        "latest_date = interest_df[\"Date\"].max()\n",
        "\n",
        "# Print results\n",
        "print(f\"Number of documents: {num_docs}\")\n",
        "print(f\"Earliest date: {earliest_date.date()}\")\n",
        "print(f\"Latest date: {latest_date.date()}\")"
      ],
      "metadata": {
        "id": "s4N6NzoJKreJ"
      },
      "execution_count": null,
      "outputs": []
    },
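    {
      "cell_type": "markdown",
      "source": [
        "The same three statistics (number of documents, earliest and latest date) are computed in a separate cell for each corpus. A small helper could replace these repeated cells. This is a minimal sketch, not part of the original pipeline: `summarize_corpus` is a hypothetical name, and it assumes the date column can be parsed by `pd.to_datetime`."
      ],
      "metadata": {}
    },
    {
      "cell_type": "code",
      "source": [
        "import pandas as pd\n",
        "\n",
        "def summarize_corpus(df, date_col):\n",
        "    \"\"\"Return the number of documents and the earliest/latest date of a corpus.\n",
        "\n",
        "    Hypothetical helper: dates that cannot be parsed become NaT and are ignored by min()/max().\n",
        "    \"\"\"\n",
        "    dates = pd.to_datetime(df[date_col], errors=\"coerce\")\n",
        "    return {\"num_docs\": len(df), \"earliest\": dates.min(), \"latest\": dates.max()}\n",
        "\n",
        "# Usage with the interest group data loaded above:\n",
        "# summarize_corpus(interest_df, \"Date\")\n",
        "# The media and party corpora use a lowercase \"date\" column instead."
      ],
      "metadata": {},
      "execution_count": null,
      "outputs": []
    },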
    {
      "cell_type": "code",
      "source": [
        "!pip install openai --quiet"
      ],
      "metadata": {
        "id": "F3aH04acKtVa"
      },
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "code",
      "source": [
        "import openai\n",
        "\n",
        "openai.api_key = \"key\"  # Replace \"key\" with your own secret OpenAI API key"
      ],
      "metadata": {
        "id": "TCNcKXl4KyO9"
      },
      "execution_count": null,
      "outputs": []
    },
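    {
      "cell_type": "markdown",
      "source": [
        "Writing the key directly into the notebook risks publishing it together with the code. A minimal alternative, assuming the key has been stored beforehand in an environment variable (the name `OPENAI_API_KEY` is an assumption here), is to read it at runtime; in Colab, a secret read with `google.colab.userdata.get` is another option. The helper `load_openai_key` is hypothetical."
      ],
      "metadata": {}
    },
    {
      "cell_type": "code",
      "source": [
        "import os\n",
        "\n",
        "def load_openai_key(env_var=\"OPENAI_API_KEY\"):\n",
        "    \"\"\"Read the OpenAI API key from an environment variable.\n",
        "\n",
        "    Raises RuntimeError if the variable is unset or empty, so that no\n",
        "    request is sent with a placeholder key.\n",
        "    \"\"\"\n",
        "    key = os.environ.get(env_var, \"\").strip()\n",
        "    if not key:\n",
        "        raise RuntimeError(f\"Environment variable {env_var} is not set.\")\n",
        "    return key\n",
        "\n",
        "# Usage (assumes the variable has been set):\n",
        "# import openai\n",
        "# openai.api_key = load_openai_key()"
      ],
      "metadata": {},
      "execution_count": null,
      "outputs": []
    },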
    {
      "cell_type": "code",
      "source": [
        "import openai\n",
        "\n",
        "# Set your API key\n",
        "openai.api_key = \"key\"  # Replace with your actual key\n",
        "\n",
        "def classify_climate_ambition(text):\n",
        "    prompt = f\"\"\"\n",
        "You are an expert in German environmental and housing policy, focusing on climate mitigation in the heating and housing sector.\n",
        "\n",
        "Given the following text, please evaluate its level of climate mitigation ambition on a scale from -2 to +2, where:\n",
        "\n",
        "- **-2** = Strong opposition to climate mitigation (e.g., rejecting regulations, downplaying climate issues in housing/heating)\n",
        "- **-1** = Moderate or implicit criticism of climate mitigation (e.g., emphasizing burdens, slowing down policies)\n",
        "- **0** = Neutral or no mention of climate mitigation in heating/housing\n",
        "- **1** = Moderate support (e.g., positive mention of mitigation goals or technologies without strong commitment)\n",
        "- **2** = Strong support (e.g., endorsing ambitious policies, clear commitment, or implementation details)\n",
        "\n",
        "\n",
        "Text:\n",
        "\\\"\\\"\\\"{text}\\\"\\\"\\\"\n",
        "\n",
        "Please provide only the numeric score (-2 to +2) and a one-sentence explanation specific to heating and housing climate mitigation ambition in Germany.\n",
        "\"\"\"\n",
        "\n",
        "    try:\n",
        "        response = openai.chat.completions.create(\n",
        "            model=\"gpt-4o\",  # or \"gpt-4\" or \"gpt-4o-mini\" if preferred\n",
        "            messages=[{\"role\": \"user\", \"content\": prompt}],\n",
        "            temperature=0.0,\n",
        "            max_tokens=100,\n",
        "        )\n",
        "        return response.choices[0].message.content.strip()\n",
        "\n",
        "    except Exception as e:\n",
        "        print(f\"Error during classification: {e}\")\n",
        "        return None"
      ],
      "metadata": {
        "id": "ppXcVXARK-bI"
      },
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "code",
      "source": [
        "interest_df = pd.read_feather('/content/drive/interest_df_topic_climate_labels.feather')"
      ],
      "metadata": {
        "id": "iIdsJBsrNVxs"
      },
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "code",
      "source": [
        "# Extract the score from model output (e.g., \"1 – Explanation\")\n",
        "interest_df[\"ambition_score\"] = interest_df[\"climate_ambition\"].str.extract(r'(-?\\d)').astype(float)"
      ],
      "metadata": {
        "id": "t0XM6bvCNXyW"
      },
      "execution_count": null,
      "outputs": []
    },
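    {
      "cell_type": "markdown",
      "source": [
        "The pattern `-?\\d` keeps only the first digit of the reply (with an optional minus sign), so a reply that starts with a year, or a value outside -2 to +2, would still yield a score. A stricter parser could read the first whole signed integer and discard values outside the intended range. The sketch below is a hypothetical helper (`parse_ambition_score`), not the method used to produce the published scores."
      ],
      "metadata": {}
    },
    {
      "cell_type": "code",
      "source": [
        "import re\n",
        "\n",
        "# First signed integer in the reply, e.g. \"-1\" in \"-1 – Emphasises burdens for tenants.\"\n",
        "SCORE_PATTERN = re.compile(r\"(?<!\\d)([+-]?\\d+)\")\n",
        "\n",
        "def parse_ambition_score(reply, low=-2, high=2):\n",
        "    \"\"\"Return the first integer in an LLM reply as a float if it lies in [low, high], else None.\"\"\"\n",
        "    if not isinstance(reply, str):\n",
        "        return None  # e.g. None stored after a failed API call\n",
        "    match = SCORE_PATTERN.search(reply)\n",
        "    if match is None:\n",
        "        return None\n",
        "    score = int(match.group(1))\n",
        "    return float(score) if low <= score <= high else None\n",
        "\n",
        "# Usage (an alternative to the str.extract call above):\n",
        "# interest_df[\"ambition_score\"] = interest_df[\"climate_ambition\"].map(parse_ambition_score).astype(float)"
      ],
      "metadata": {},
      "execution_count": null,
      "outputs": []
    },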
    {
      "cell_type": "code",
      "source": [
        "interest_df.groupby(\"Akteur\")[\"ambition_score\"].mean().sort_values().plot(kind=\"barh\", title=\"Average Climate Ambition by Interest Group\")"
      ],
      "metadata": {
        "id": "jI2_GD6ZNZ_X"
      },
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "code",
      "source": [
        "akteur_to_type = {\n",
        "    \"AWO\": \"Social (AWO, Caritas, Diakonie, Paritätischer)\",\n",
        "    \"BEE\": \"Renewable Energy (BEE)\",\n",
        "    \"GDW\": \"Owners (GDW, ZIA, BID, HuG)\",\n",
        "    \"DMB\": \"Tenants (DMB)\",\n",
        "    \"Paritätische\": \"Social (AWO, Caritas, Diakonie, Paritätischer)\",\n",
        "    \"ZIA\": \"Owners (GDW, ZIA, BID, HuG)\",\n",
        "    \"Caritas\": \"Social (AWO, Caritas, Diakonie, Paritätischer)\",\n",
        "    \"GDV\": \"Home Insurances (GDV)\",\n",
        "    \"DStGB\": \"Cities (DST, DStGB)\",\n",
        "    \"Diakonie\": \"Social (AWO, Caritas, Diakonie, Paritätischer)\",\n",
        "    \"VKU\": \"Communal Energy (VKU)\",\n",
        "    \"DST\": \"Cities (DST, DStGB)\",\n",
        "    \"BID\": \"Owners (GDW, ZIA, BID, HuG)\",\n",
        "    \"HuG\": \"Owners (GDW, ZIA, BID, HuG)\",\n",
        "    \"BDEW\": \"Energy Providers (BDEW)\",\n",
        "    \"AWO Bundesverband\": \"Social (AWO, Caritas, Diakonie, Paritätischer)\",\n",
        "    \"Klima-Allianz Deutschland AWO Bundesverband Deutscher Caritasverband Diakonie Deutschland Paritätischer Gesamtverband\": \"Social (AWO, Caritas, Diakonie, Paritätischer)\",\n",
        "    \"Diakonie; BEE; Caritas; NABU; Paritätische; DMB\": \"Environment (NABU)\"\n",
        "}\n"
      ],
      "metadata": {
        "id": "tnDAXlLqNbtZ"
      },
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "code",
      "source": [
        "interest_df[\"type\"] = interest_df[\"Akteur\"].map(akteur_to_type)"
      ],
      "metadata": {
        "id": "5_uMIjy_NdXT"
      },
      "execution_count": null,
      "outputs": []
    },
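    {
      "cell_type": "markdown",
      "source": [
        "`Series.map` returns NaN for every actor name that is not a key of `akteur_to_type`, and `groupby` then drops those rows from the averages without a warning. A quick check for unmapped names could look like this (the helper `unmapped_values` is hypothetical):"
      ],
      "metadata": {}
    },
    {
      "cell_type": "code",
      "source": [
        "import pandas as pd\n",
        "\n",
        "def unmapped_values(series, mapping):\n",
        "    \"\"\"Return the sorted distinct non-null values of `series` that are not keys of `mapping`.\"\"\"\n",
        "    is_unmapped = ~series.isin(list(mapping))\n",
        "    return sorted(series[is_unmapped].dropna().unique())\n",
        "\n",
        "# Usage with the objects defined above; an empty list means every actor has a type:\n",
        "# unmapped_values(interest_df[\"Akteur\"], akteur_to_type)"
      ],
      "metadata": {},
      "execution_count": null,
      "outputs": []
    },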
    {
      "cell_type": "code",
      "source": [
        "import matplotlib.pyplot as plt\n",
        "\n",
        "interest_df.groupby(\"type\")[\"ambition_score\"].mean().sort_values().plot(\n",
        "    kind=\"barh\",\n",
        "    title=\"\",\n",
        "    figsize=(10, 6),\n",
        "    color=\"seagreen\"\n",
        ")\n",
        "\n",
        "plt.xlim(-2, 2)  # Set x-axis range from -2 to +2\n",
        "plt.xlabel(\"Durchschnittlicher Ambitionswert (-2 = Starke Ablehnung, +2 = Starke Zustimmung)\")\n",
        "plt.tight_layout()\n",
        "plt.grid(True, axis='x', linestyle='--', alpha=0.5)\n",
        "plt.show()\n"
      ],
      "metadata": {
        "id": "W2-gVRE2NfAg"
      },
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "source": [
        "# 2. Media"
      ],
      "metadata": {
        "id": "kYHjvMz1LHLm"
      }
    },
    {
      "cell_type": "code",
      "source": [
        "import pandas as pd\n",
        "\n",
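        "# Load the media file with topic labels (full texts, not sentence level)\n",
        "media_df = pd.read_feather('/content/drive/media_df_topic_lables.feather')\n",
        "\n",
        "# Keep only articles classified as on-topic (predicted_label == 1)\n",
        "filtered_media = media_df[media_df[\"predicted_label\"] == 1].copy()"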
      ],
      "metadata": {
        "id": "DVMlb9PYLg1e"
      },
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "code",
      "source": [
        "# Number of documents\n",
        "num_docs = len(media_df)\n",
        "\n",
        "# Convert date column to datetime if not already\n",
        "media_df[\"date\"] = pd.to_datetime(media_df[\"date\"], errors=\"coerce\")\n",
        "\n",
        "# Earliest and latest date\n",
        "earliest_date = media_df[\"date\"].min()\n",
        "latest_date = media_df[\"date\"].max()\n",
        "\n",
        "# Print results\n",
        "print(f\"Number of documents: {num_docs}\")\n",
        "print(f\"Earliest date: {earliest_date.date()}\")\n",
        "print(f\"Latest date: {latest_date.date()}\")"
      ],
      "metadata": {
        "id": "gNQVx3OdLs_e"
      },
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "code",
      "source": [
        "import openai\n",
        "\n",
        "openai.api_key = \"key\"  # Replace with your own key; better, load it from an environment variable or a Colab secret"
      ],
      "metadata": {
        "id": "7Bh_6bc7LvlD"
      },
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "code",
      "source": [
        "import openai\n",
        "\n",
        "# Set your API key\n",
        "openai.api_key = \"key\"  # Replace with your actual key\n",
        "\n",
        "def classify_climate_ambition(text):\n",
        "    prompt = f\"\"\"\n",
        "You are an expert in German environmental and housing policy, focusing on climate mitigation in the heating and housing sector.\n",
        "\n",
        "Given the following text, please evaluate its level of climate mitigation ambition on a scale from -2 to +2, where:\n",
        "\n",
        "- **-2** = Strong opposition to climate mitigation (e.g., rejecting regulations, downplaying climate issues in housing/heating)\n",
        "- **-1** = Moderate or implicit criticism of climate mitigation (e.g., emphasizing burdens, slowing down policies)\n",
        "- **0** = Neutral or no mention of climate mitigation in heating/housing\n",
        "- **1** = Moderate support (e.g., positive mention of mitigation goals or technologies without strong commitment)\n",
        "- **2** = Strong support (e.g., endorsing ambitious policies, clear commitment, or implementation details)\n",
        "\n",
        "\n",
        "Text:\n",
        "\\\"\\\"\\\"{text}\\\"\\\"\\\"\n",
        "\n",
        "Please provide only the numeric score (-2 to +2) and a one-sentence explanation specific to heating and housing climate mitigation ambition in Germany.\n",
        "\"\"\"\n",
        "\n",
        "    try:\n",
        "        response = openai.chat.completions.create(\n",
        "            model=\"gpt-4o\",  # or \"gpt-4\" or \"gpt-4o-mini\" if preferred\n",
        "            messages=[{\"role\": \"user\", \"content\": prompt}],\n",
        "            temperature=0.0,\n",
        "            max_tokens=100,\n",
        "        )\n",
        "        return response.choices[0].message.content.strip()\n",
        "\n",
        "    except Exception as e:\n",
        "        print(f\"Error during classification: {e}\")\n",
        "        return None"
      ],
      "metadata": {
        "id": "BclMWo6_Ly4I"
      },
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "code",
      "source": [
        "import time\n",
        "import pandas as pd\n",
        "\n",
        "# Optional: adjust this path to your needs\n",
        "checkpoint_path = \"/content/filtered_media_with_climate_ambition.feather\"\n",
        "\n",
        "# Only re-classify if not already done\n",
        "if \"climate_ambition\" not in filtered_media.columns:\n",
        "    filtered_media[\"climate_ambition\"] = \"\"\n",
        "\n",
        "# Loop over rows; count new classifications so checkpoints do not depend on gaps in the filtered index\n",
        "n_classified = 0\n",
        "for idx, row in filtered_media.iterrows():\n",
        "    if pd.isna(row[\"climate_ambition\"]) or row[\"climate_ambition\"] == \"\":\n",
        "        try:\n",
        "            result = classify_climate_ambition(row[\"text\"])\n",
        "            filtered_media.at[idx, \"climate_ambition\"] = result\n",
        "            n_classified += 1\n",
        "\n",
        "            # Save a checkpoint every 10 newly classified rows\n",
        "            if n_classified % 10 == 0:\n",
        "                filtered_media.to_feather(checkpoint_path)\n",
        "                print(f\"Saved checkpoint after {n_classified} rows (index {idx})\")\n",
        "\n",
        "            time.sleep(1.2)  # Pause to respect OpenAI rate limits\n",
        "\n",
        "        except Exception as e:\n",
        "            print(f\"Error at row {idx}: {e}\")\n",
        "            time.sleep(5)\n",
        "\n",
        "# Final save\n",
        "filtered_media.to_feather('/content/drive/media_df_topic_climate_labels.feather')\n",
        "print(\"Finished classification and saved full results.\")"
      ],
      "metadata": {
        "id": "XPu3LBrmL0w1"
      },
      "execution_count": null,
      "outputs": []
    },
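    {
      "cell_type": "markdown",
      "source": [
        "The classification loop above, and its copy for the party statements, could be factored into one reusable function. The sketch below is hypothetical (`classify_rows` and its parameters are not part of the original code): it skips rows that already hold a result, so an interrupted run can resume from a checkpoint, writes a checkpoint after every `checkpoint_every` newly classified rows, and accepts any classifier function, so it can be dry-run with a stub before any API credits are spent."
      ],
      "metadata": {}
    },
    {
      "cell_type": "code",
      "source": [
        "import time\n",
        "import pandas as pd\n",
        "\n",
        "def classify_rows(df, classify_fn, text_col=\"text\", out_col=\"climate_ambition\",\n",
        "                  checkpoint_path=None, checkpoint_every=10, pause=1.2):\n",
        "    \"\"\"Fill df[out_col] in place with classify_fn(text) wherever it is still empty.\n",
        "\n",
        "    Returns the number of rows classified in this call.\n",
        "    \"\"\"\n",
        "    if out_col not in df.columns:\n",
        "        df[out_col] = None\n",
        "    n_done = 0\n",
        "    for idx, text in df[text_col].items():\n",
        "        current = df.at[idx, out_col]\n",
        "        if pd.notna(current) and current != \"\":\n",
        "            continue  # already classified in an earlier run\n",
        "        df.at[idx, out_col] = classify_fn(text)\n",
        "        n_done += 1\n",
        "        if checkpoint_path and n_done % checkpoint_every == 0:\n",
        "            # Some pandas versions only write feather files with a default index\n",
        "            df.reset_index(drop=True).to_feather(checkpoint_path)\n",
        "        time.sleep(pause)  # crude pause between API calls\n",
        "    return n_done\n",
        "\n",
        "# Usage, mirroring the cell above (classify_climate_ambition is defined earlier):\n",
        "# classify_rows(filtered_media, classify_climate_ambition,\n",
        "#               checkpoint_path=\"/content/filtered_media_with_climate_ambition.feather\")"
      ],
      "metadata": {},
      "execution_count": null,
      "outputs": []
    },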
    {
      "cell_type": "code",
      "source": [
        "filtered_media = pd.read_feather('/content/drive/media_df_topic_climate_labels.feather')"
      ],
      "metadata": {
        "id": "YXGZwJQsMKEv"
      },
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "code",
      "source": [
        "# Number of documents\n",
        "num_docs = len(filtered_media)\n",
        "\n",
        "# Convert date column to datetime if not already\n",
        "filtered_media[\"date\"] = pd.to_datetime(filtered_media[\"date\"], errors=\"coerce\")\n",
        "\n",
        "# Earliest and latest date\n",
        "earliest_date = filtered_media[\"date\"].min()\n",
        "latest_date = filtered_media[\"date\"].max()\n",
        "\n",
        "# Print results\n",
        "print(f\"Number of documents: {num_docs}\")\n",
        "print(f\"Earliest date: {earliest_date.date()}\")\n",
        "print(f\"Latest date: {latest_date.date()}\")"
      ],
      "metadata": {
        "id": "twn0fhWrMMEi"
      },
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "code",
      "source": [
        "# Extract the score from model output (e.g., \"1 – Explanation\")\n",
        "filtered_media[\"ambition_score\"] = filtered_media[\"climate_ambition\"].str.extract(r'(-?\\d)').astype(float)"
      ],
      "metadata": {
        "id": "BhYG0XtzMOSi"
      },
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "code",
      "source": [
        "filtered_media.groupby(\"newspaper\")[\"ambition_score\"].mean().sort_values().plot(kind=\"barh\", title=\"Average Climate Ambition by Newspaper\")"
      ],
      "metadata": {
        "id": "r8hqFU23MTnD"
      },
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "code",
      "source": [
        "import matplotlib.pyplot as plt\n",
        "\n",
        "# Ensure numeric ambition score is extracted\n",
        "filtered_media[\"ambition_score\"] = filtered_media[\"climate_ambition\"].str.extract(r'(-?\\d)').astype(float)\n",
        "\n",
        "# Group by newspaper and calculate average ambition score\n",
        "avg_scores = (\n",
        "    filtered_media\n",
        "    .groupby(\"newspaper\")[\"ambition_score\"]\n",
        "    .mean()\n",
        "    .sort_values()\n",
        ")\n",
        "\n",
        "# Plot\n",
        "plt.figure(figsize=(10, 6))\n",
        "avg_scores.plot(kind=\"barh\", color=\"mediumseagreen\")\n",
        "plt.xlabel(\"Average Climate Ambition Score\")\n",
        "plt.title(\"Climate Ambition in Heating/Housing by Newspaper (Only predicted_label==1)\")\n",
        "plt.axvline(0, color='gray', linestyle='--', linewidth=1)  # line at neutral\n",
        "plt.tight_layout()\n",
        "plt.show()"
      ],
      "metadata": {
        "id": "O8TVPtJcMVWY"
      },
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "code",
      "source": [
        "import matplotlib.pyplot as plt\n",
        "\n",
        "# Normalize newspaper names to lowercase for grouping\n",
        "filtered_media[\"newspaper_clean\"] = filtered_media[\"newspaper\"].str.lower()\n",
        "\n",
        "# Extract numeric score again just in case\n",
        "filtered_media[\"ambition_score\"] = filtered_media[\"climate_ambition\"].str.extract(r'(-?\\d)').astype(float)\n",
        "\n",
        "# Group and plot\n",
        "avg_scores = (\n",
        "    filtered_media\n",
        "    .groupby(\"newspaper_clean\")[\"ambition_score\"]\n",
        "    .mean()\n",
        "    .sort_values()\n",
        ")\n",
        "\n",
        "# Plot\n",
        "plt.figure(figsize=(10, 6))\n",
        "avg_scores.plot(kind=\"barh\", color=\"mediumseagreen\")\n",
        "plt.xlabel(\"Average Climate Ambition Score\")\n",
        "plt.title(\"Climate Ambition in Heating/Housing by Newspaper (Only predicted_label==1)\")\n",
        "plt.axvline(0, color='gray', linestyle='--', linewidth=1)\n",
        "plt.tight_layout()\n",
        "plt.show()"
      ],
      "metadata": {
        "id": "j_VqDhV7MXMK"
      },
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "code",
      "source": [
        "import matplotlib.pyplot as plt\n",
        "\n",
        "# Normalize and regroup newspaper names\n",
        "name_mapping = {\n",
        "    \"sz\": \"SZ (Print and Online)\",\n",
        "    \"sz online\": \"SZ (Print and Online)\",\n",
        "    \"der spiegel\": \"Spiegel (Print and Online)\",\n",
        "    \"spiegel online\": \"Spiegel (Print and Online)\",\n",
        "    \"faz\": \"FAZ (Print and Online)\",\n",
        "    \"faz.net\": \"FAZ (Print and Online)\",\n",
        "    \"welt\": \"WELT (Print and Online)\",\n",
        "    \"welt online\": \"WELT (Print and Online)\",\n",
        "    \"bild bund\": \"Bild (Bund, Plus, Online)\",\n",
        "    \"bild.de\": \"Bild (Bund, Plus, Online)\",\n",
        "    \"bild plus\": \"Bild (Bund, Plus, Online)\",\n",
        "    \"focus online\": \"Focus (Online)\",\n",
        "    \"taz\": \"TAZ (Print)\"\n",
        "}\n",
        "\n",
        "# Clean and remap newspaper names\n",
        "filtered_media[\"newspaper_clean\"] = (\n",
        "    filtered_media[\"newspaper\"]\n",
        "    .str.lower()\n",
        "    .replace(name_mapping)\n",
        ")\n",
        "\n",
        "# Extract numeric ambition score (if not already numeric)\n",
        "filtered_media[\"ambition_score\"] = filtered_media[\"climate_ambition\"].str.extract(r'(-?\\d)').astype(float)\n",
        "\n",
        "# Group and calculate average score\n",
        "avg_scores = (\n",
        "    filtered_media\n",
        "    .groupby(\"newspaper_clean\")[\"ambition_score\"]\n",
        "    .mean()\n",
        "    .sort_values()\n",
        ")\n",
        "\n",
        "# Plot\n",
        "plt.figure(figsize=(10, 6))\n",
        "avg_scores.plot(kind=\"barh\", color=\"seagreen\")\n",
        "plt.xlabel(\"Durchschnittlicher Ambitionswert (-2 = Starke Ablehnung, +2 = Starke Zustimmung)\")\n",
        "plt.title(\"\")\n",
        "plt.axvline(0, color='gray', linestyle='--', linewidth=1)\n",
        "plt.xlim(-2, 2)  # Set x-axis range\n",
        "plt.tight_layout()\n",
        "plt.grid(True, axis='x', linestyle='--', alpha=0.5)\n",
        "plt.show()"
      ],
      "metadata": {
        "id": "A9cirmITMaOV"
      },
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "source": [
        "# 3. Parties"
      ],
      "metadata": {
        "id": "KzGXSXQFNmzG"
      }
    },
    {
      "cell_type": "code",
      "source": [
        "parties_df = pd.read_feather('/content/drive/parties_df_topic_labels.feather')"
      ],
      "metadata": {
        "id": "BFmNGNoFOaf4"
      },
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "code",
      "source": [
        "filtered_parties = parties_df[parties_df[\"predicted_label\"] == 1].copy()"
      ],
      "metadata": {
        "id": "fTqIxrpwNqzY"
      },
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "code",
      "source": [
        "import time\n",
        "import pandas as pd\n",
        "\n",
        "# Optional: adjust this path to your needs\n",
        "checkpoint_path = \"/content/filtered_parties_with_climate_ambition.feather\"\n",
        "\n",
        "# Only re-classify if not already done\n",
        "if \"climate_ambition\" not in filtered_parties.columns:\n",
        "    filtered_parties[\"climate_ambition\"] = \"\"\n",
        "\n",
        "# Loop over rows; count new classifications so checkpoints do not depend on gaps in the filtered index\n",
        "n_classified = 0\n",
        "for idx, row in filtered_parties.iterrows():\n",
        "    if pd.isna(row[\"climate_ambition\"]) or row[\"climate_ambition\"] == \"\":\n",
        "        try:\n",
        "            result = classify_climate_ambition(row[\"text\"])\n",
        "            filtered_parties.at[idx, \"climate_ambition\"] = result\n",
        "            n_classified += 1\n",
        "\n",
        "            # Save a checkpoint every 10 newly classified rows\n",
        "            if n_classified % 10 == 0:\n",
        "                filtered_parties.to_feather(checkpoint_path)\n",
        "                print(f\"Saved checkpoint after {n_classified} rows (index {idx})\")\n",
        "\n",
        "            time.sleep(1.2)  # Pause to respect OpenAI rate limits\n",
        "\n",
        "        except Exception as e:\n",
        "            print(f\"Error at row {idx}: {e}\")\n",
        "            time.sleep(5)\n",
        "\n",
        "# Final save\n",
        "filtered_parties.to_feather('/content/drive/parties_df_topic_climate_labels.feather')\n",
        "print(\"Finished classification and saved full results.\")"
      ],
      "metadata": {
        "id": "6hp2xbYfNsrk"
      },
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "code",
      "source": [
        "filtered_parties = pd.read_feather('/content/drive/parties_df_topic_climate_labels.feather')"
      ],
      "metadata": {
        "id": "ydXeIJ1_NxUn"
      },
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "code",
      "source": [
        "# Number of documents\n",
        "num_docs = len(filtered_parties)\n",
        "\n",
        "# Convert date column to datetime if not already\n",
        "filtered_parties[\"date\"] = pd.to_datetime(filtered_parties[\"date\"], errors=\"coerce\")\n",
        "\n",
        "# Earliest and latest date\n",
        "earliest_date = filtered_parties[\"date\"].min()\n",
        "latest_date = filtered_parties[\"date\"].max()\n",
        "\n",
        "# Print results\n",
        "print(f\"Number of documents: {num_docs}\")\n",
        "print(f\"Earliest date: {earliest_date.date()}\")\n",
        "print(f\"Latest date: {latest_date.date()}\")"
      ],
      "metadata": {
        "id": "mFa10ZCxNzVG"
      },
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "code",
      "source": [
        "filtered_parties[\"ambition_score\"] = filtered_parties[\"climate_ambition\"].str.extract(r'(-?\\d)').astype(float)"
      ],
      "metadata": {
        "id": "DVXyCSlrN1KU"
      },
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "code",
      "source": [
        "import matplotlib.pyplot as plt\n",
        "import seaborn as sns\n",
        "\n",
        "# Group by party and compute mean\n",
        "party_avg = (\n",
        "    filtered_parties.groupby(\"partei\")[\"ambition_score\"]\n",
        "    .mean()\n",
        "    .sort_values()\n",
        "    .reset_index()\n",
        ")\n",
        "\n",
        "# Plot\n",
        "plt.figure(figsize=(10, 6))\n",
        "sns.barplot(data=party_avg, x=\"ambition_score\", y=\"partei\", palette=\"coolwarm\")\n",
        "\n",
        "plt.title(\"Average Climate Ambition Score by Party (Heating/Housing Sector)\")\n",
        "plt.xlabel(\"Average Ambition Score (-2 = Opposition, +2 = Strong Support)\")\n",
        "plt.ylabel(\"Party\")\n",
        "plt.grid(True, axis='x', linestyle='--', alpha=0.5)\n",
        "plt.xlim(-2, 2)\n",
        "\n",
        "plt.tight_layout()\n",
        "plt.show()"
      ],
      "metadata": {
        "id": "BzSgoJTlN27d"
      },
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "code",
      "source": [
        "import matplotlib.pyplot as plt\n",
        "import seaborn as sns\n",
        "\n",
        "# Count and mean per party\n",
        "party_stats = (\n",
        "    filtered_parties.groupby(\"partei\")[\"ambition_score\"]\n",
        "    .agg([\"mean\", \"count\"])\n",
        "    .sort_values(\"mean\")\n",
        "    .reset_index()\n",
        ")\n",
        "\n",
        "# Plot\n",
        "plt.figure(figsize=(10, 6))\n",
        "sns.barplot(data=party_stats, x=\"mean\", y=\"partei\", palette=\"coolwarm\")\n",
        "\n",
        "# Add count labels\n",
        "for i, row in party_stats.iterrows():\n",
        "    plt.text(\n",
        "        row[\"mean\"] + 0.05,  # slight offset to the right\n",
        "        i,  # vertical position\n",
        "        f'n={row[\"count\"]}',\n",
        "        va='center',\n",
        "        fontsize=10,\n",
        "        color='black'\n",
        "    )\n",
        "\n",
        "plt.title(\"Average Climate Ambition Score by Party\\n(Heating/Housing Sector)\")\n",
        "plt.xlabel(\"Average Ambition Score (-2 = Strong Opposition, +2 = Strong Support)\")\n",
        "plt.ylabel(\"Party\")\n",
        "plt.grid(True, axis='x', linestyle='--', alpha=0.5)\n",
        "plt.xlim(-2, 2)\n",
        "\n",
        "plt.tight_layout()\n",
        "plt.show()"
      ],
      "metadata": {
        "id": "Dob9VeWtN5QE"
      },
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "code",
      "source": [
        "import matplotlib.pyplot as plt\n",
        "import seaborn as sns\n",
        "\n",
        "# Filter out unwanted parties\n",
        "filtered_parties_clean = filtered_parties[\n",
        "    ~filtered_parties[\"partei\"].isin([\"Werteunion\", \"Volt Deutschland\", \"Bündnis Deutschland\"])\n",
        "]\n",
        "\n",
        "# Group by party and compute mean\n",
        "party_avg = (\n",
        "    filtered_parties_clean.groupby(\"partei\")[\"ambition_score\"]\n",
        "    .mean()\n",
        "    .sort_values()\n",
        "    .reset_index()\n",
        ")\n",
        "\n",
        "# Plot\n",
        "plt.figure(figsize=(10, 6))\n",
        "sns.barplot(data=party_avg, x=\"ambition_score\", y=\"partei\", color=\"seagreen\")\n",
        "\n",
        "plt.title(\"\")\n",
        "plt.xlabel(\"Durchschnittlicher Ambitionswert (-2 = Starke Ablehnung, +2 = Starke Zustimmung)\")\n",
        "plt.ylabel(\"\")\n",
        "plt.grid(True, axis='x', linestyle='--', alpha=0.5)\n",
        "plt.xlim(-2, 2)\n",
        "plt.tight_layout()\n",
        "plt.show()"
      ],
      "metadata": {
        "id": "6ziA5qm_N7RY"
      },
      "execution_count": null,
      "outputs": []
    }
  ]
}